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Abstract 

We define two algorithms for propagating information in classification problems with pair¬ 
wise relationships. The algorithms are based on contraction maps and are related to non-linear 
diffusion and random walks on graphs. The approach is also related to message passing algo¬ 
rithms, including belief propagation and mean field methods. The algorithms we describe are 
guaranteed to converge on graphs with arbitrary topology. Moreover they always converge to 
a unique fixed point, independent of initialization. We prove that the fixed points of the algo¬ 
rithms under consideration define lower-bounds on the energy function and the max-marginals 
of a Markov random field. The theoretical results also illustrate a relationship between message 
passing algorithms and value iteration for an infinite horizon Markov decision process. We il¬ 
lustrate the practical application of the algorithms under study with numerical experiments in 
image restoration, stereo depth estimation and binary classification on a grid. 


1 Introduction 

In many classification problems there are relationships among a set of items to be classified. For 
example, in image reconstruction problems adjacent pixels are likely to belong to the same object 
or image segment. This leads to relationships between the labels of different pixels in an image. 
Energy minimization methods based on Markov random helds (MRF) address these problems in 
a common framework [H l23l 115) . Within this framework we introduce two new algorithms for 
classification with pairwise information. These algorithms are based on contraction maps and are 
related to non-linear diffusion and random walks on graphs. 

The setting under consideration is as follows. Let G = (U, E) be an undirected simple graph and 
L be a set of labels. A labeling of U is a function x : U —)• L assigning a label from L to each vertex 
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in V. Local information is modeled by a cost gi{a) for assigning label a to vertex i. Information 
on label compatibility for neighboring vertices is modeled by a cost hij{a,b) for assigning label a 
to vertex i and label b to vertex j. The cost for a labeling x is defined by an energy function, 

F{x) = '^gi{xi) + ^ hij{xi,Xj). (1) 

i&V 

In the context of MRFs the energy function dehnes a Gibbs distribution on random variables X 
associated with the vertices V, 

p(X = x) = ^ exp(-F(x)). (2) 

Minimizing the energy F(x) corresponds to maximizing p(X = x). This approach has been 
applied to a variety of problems in image processing and computer vision m- A classical example 
involves restoring corrupted images mM- In image restoration there is a grid of pixels and the 
problem is to estimate an intensity value for each pixel. To restore an image I one looks for 
an image J that is similar to I and is smooth almost everywhere. Similarity between I and J is 
defined by local costs at each pixel. The smoothness constraint is defined by pairwise costs between 
neighboring pixels in J. 

1.1 Basic Definitions and Overview of Results 

Let G = {V, E) be an undirected, simple, connected graph, with more than one vertex. For 
simplicity let V = {!,... ,n}. Let X(i) and d(i) denote respectively the set of neighbors and the 
degree of vertex i, 

x(i) = {j e V I {i,j} e Ej, d(i) = |iv(i)|. 

Let L be a set of labels. For each vertex i G F we have a non-negative cost for assigning label a 
to vertex i, denoted by gi{a). These costs capture local information about the label of each vertex. 
For each edge {i,j} € E we have a non-negative cost for assigning label a to vertex i and label b to 
vertex j, denoted equally by hij{a, b) or hji{b, a). These costs capture relationships between labels 
of neighboring vertices. 

gi : L ^ [0, oo) for z G F; 

hij,hji : —)• [ 0 , 00 ) for {i,j} G E with hij{a,b) = hji{b,a) 

Let X G denote a labeling of F with labels from L. A cost for x that takes into account 
both local information at each vertex and the pairwise relationships can be defined by an energy 
function E : LX —>■ M, 

F{x) = '^gi{xi) + ^ hij{xi,Xj). (3) 

i(zV 
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This leads to a natural optimization problem where we look for a labeling x with minimum energy. 

Throughout the paper we assume L is finite. The optimization problem defined by F is NP-hard 
even when \L\ = 2 as it can be used to solve the independent set problem on G. It can also be used 
to solve coloring with k colors when \L\ = k. The optimization problem can be solved in polynomial 
time using dynamic programming when G is a tree [2] . More generally dynamic programming leads 
to polynomial optimization algorithms when the graph G is chordal (triangulated) and has bounded 
tree-width. 

Min-sum (max-product) belief propagation [231 US] is a local message passing algorithm that 
is equivalent to dynamic programming when G is a tree. Both dynamic programming and belief 
propagation aggregate local costs by sequential propagation of information along the edges in E. 

For i £ V we define the value function /j : L —)• M, 

/j(r) = min F{x). (4) 

Xi=T 

In the context of MRFs the value functions are also known as max-marginals. The value functions 
are also what is computed by the dynamic programming and belief propagation algorithms for 
minimizing F when G is a tree. Each value function defines a cost for assigning a label to a vertex 
that takes into account the whole graph. If x* minimizes F{x) then x* minimizes and when 

/i(r) has a unique minimum we can minimize F{x) by selecting 

X- = argmin/j(r). (5) 

r 

A local belief is a function 7 : L —)■ M. A field of beliefs specifies a local belief for each vertex 
in V, and is an element of 

(M^)^ = {y, = ((^^,..., ip^) I : L M}. ( 6 ) 

We define two algorithms in terms of maps, 

T : (M^)^ ^ (M^)^, 

S : (M^)^ ^ (M^)^. 

The maps T and S are closely related. Both maps are contractions, but each of them has its 
own unique fixed point. Each of these maps can be used to define an algorithm to optimize F(x) 
based on fixed point iterations and local decisions. 

For z G {T, S'} we start from an initial field of beliefs (p^ and sequentially compute 

/+i = z(/). 

Both and converge to the unique fixed points of S and T respectively. After 

convergence to a fixed point (p (or a bounded number of iterations in practice) we select a labeling 
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X by selecting the label minimizing the belief at each vertex (breaking ties arbitrarily), 


Xi = argmin ifi (r). 


( 7 ) 


The algorithms we consider depend on parameters p G (0,1), g = 1 — p and weights Wij G [0,1] 
for each i and j G N{i). The weights from each vertex are constrained to sum to one, 


E 


Wij = 1, 


Vi G V. (8) 

j&N{i) 

These weights can be interpreted in terms of transition probabilities for a random walk on G. In a 
uniform random walk we have Wij = l/d(i). Non-uniform weights can be used to capture additional 
information about an underlying application. For example, in the case of stereo depth estimation 


(Section 5.2) we have used non-uniform weights that reflect color similarity between neighboring 
pixels. We note, however, that while we may interpret the results of the fixed point algorithms in 
terms of transition probabilities in a random walk, the algorithms we study are deterministic. 

The maps S and T we consider are defined as follows. 


Definition 1.1. 


{T^p)i{T) =pgi{T) + 


E 

j&N(i) 


P 

min -hij{T,Uj) + qwji(pj{uj) 

Uj£L Z 


{S(p)i{T) = pgi{T) + Wij min phij{T,Uj) + q(pj{uj) 

^^ Ui&L 

j£N{i) 


(9) 

( 10 ) 


The map dehned by T corresponds to a form of non-linear diffusion of beliefs along the edges of 
G. The map defined by S corresponds to value iteration for a Markov decision process (MDP) [3] 
defined by random walks on G. We show that both S and T are contractions. Let (p be the fixed 
point of T and (p be the fixed point of S. We show (p defines a lower bound on the energy function 
F, and that (p defines lower bounds on the value functions fi, 


g^iixi) < F{x), Mx G lY, 


iev 


0i(T) < fiir), 


yieV, reL. 


( 11 ) 


( 12 ) 


In Section we study the hxed point iteration algorithm defined by T and the relationship 
between (p and F. To the extent that Yliev approximates F{x) this justifies selecting a 
labeling x by minimizing (pi at each vertex. This approach is related to mean field methods and 
variational inference with the Gibbs distribution p{X = x) |23l I15j . 

In Section we study the algorithm defined by S and the relationship between (pi and fi. 
To the extent that p>i{T) approximates /i(r) this justihes selecting a labeling x by minimizing p>i 
at each vertex. We also show a connection between the fixed point (p and optimal policies of a 
Markov decision process. The process is defined in terms of random walks on G, with transition 
probabilities given by the weights Wij. 
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(a) Attractive relationships (b) Repulsive relationships 

Figure 1: The fixed points of T on two problems defined on the graph above. In this case L = {1, 2}. 
In both cases the local costs gi are all zero except for vertex 1 who has a preference towards label 2. 
In (a) the pairwise costs encourage neighboring vertices to take the same label. In (b) the pairwise 
costs encourage neighboring vertices to take different labels. 


1.2 Examples 

Figure shows two examples of fixed points of T when the graph G = {V,E) \s a, cycle with 5 
vertices. In this case we have a binary labeling problem L = {1,2}. The local costs are all zero 
except that vertex 1 has a preference for label 2. This is encoded by a cost for label 1, 


5i(l) = l, 

51 ( 2 ) = 0 , 

( 7 j(a) = 0, Vi / 1, a G L. 

In example (a) we have pairwise costs that encourage equal labels for neighboring vertices, 

a = b 
a b, 

In example (b) we have pairwise costs that encourage different labels for neighboring vertices, 


hij{a,b) = 


(13) 

(14) 

(15) 


(16) 


hij{a,b) = 


5 


a = b 
a b, 


(17) 










































Figure shows a graphical representation of the local costs for each vertex, and the value 
of (p, the fixed point of T, on each example. In (a) local selection of Xi minimizing (pi leads to 
X = (2, 2, 2, 2, 2). In (b) local selection of Xi minimizing (pi leads to x = (2,1,2, 2,1). In both 
examples the resulting labeling x is the global minimum of F{x). For these examples we used 
p = 0.1 and Wij = l/d(i). 

Of course in general local minimization of (p does not lead to a labeling minimizing F{x) and 
it would be interesting to characterize when this happens. 


1.3 Related Work 


For general graphs G, when the pairwise costs hij{a, b) define a metric over L there are polynomial 
time approximation algorithms for the optimization problem defined by F [T3] . In some important 
cases the optimization problem can be solved using graph cuts and maximum flow algorithms 
[la 0 0 ttz]. This includes in particular the case of MAP estimation for an Ising model with an 


external field m- 

The algorithms we study are closely related to message passing methods, in particular to min- 
sum (or equivalently max-product) belief propagation (BP) [23lll5]' When the graph G is a tree, BP 
converges and solves the optimization problem defined by F. Unfortunately BP is not guaranteed 
to converge and it can have multiple fixed points for general graphs. Some form of dampening can 
help BP converge in practice. The algorithms we study provide a simple alternative to min-sum 
belief propagation that is guaranteed to converge to a unique fixed point, regardless of initialization. 
The algorithms are also guaranteed to converge “quickly”. 

One approach for solving the optimization problem defined by F involves using a linear program 
(LP) relaxation. The optimization problem can be posed using a LP with a large number of 
constraints and relaxed to obtain a tractable LP over the local polytope m- Several message 
passing methods have been motivated in terms of this LP m- There are also recent methods 
which use message passing in the inner loop of an algorithm that converges to the optimal solution 
of the local polytope LP relaxation [201 ED- In Section |3.I| we characterize the fixed point of S 
using a different LP. 

The mean-field algorithm [231115] is an iterative method for approximating the Gibbs distribu¬ 
tion p{x) by a factored distribution q{x), 


<lix) = qi{xi). (18) 

iev 

The mean-field approach involves minimization of the KL divergence between p and q using fixed 
point iterations that repeatedly update the factors qi defining q. A drawback of the approach is 
that the fixed point is not unique and the method is sensitive to initialization. 

The algorithm defined by T is related to the mean-field method in the sense that the fixed 
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points of T appear to approximate F{x) by a function H{x) that is a sum of local terms, 

H{x) = '^(fi{xi). (19) 

iev 

We do not know, however, if there is a measure under which the resulting H{x) is an optimal 
approximation to F{x) within the class of functions dehned by a sum of local terms. 

2 Preliminaries 

The algorithms we study are efficient in the following sense. Let m = \E\ and k = \L\. Each 
iteration in the fixed point algorithm involves evaluating T or S. This can be done in 0{mk‘^) by 
“brute-force” evaluation of the expressions in Definition [m In many applications, including in 
image restoration and stereo matching, the pairwise cost hij has special structure that allows for 
faster computation using the techniques described in m- This leads to an 0{mk) algorithm for 
each iteration of the fixed point methods. Additionally, the algorithms are easily parallelizable. 

The fixed point algorithms defined by T and S converge quickly because the maps are contrac¬ 
tions in 

Let z : —)> and ||x|| be a norm in M^. For 7 G (0,1), z is a 7 -contraction if 

||z(x)-z(y)|| < 7 ||x-y||. ( 20 ) 

When z is a contraction it has a unique fixed point x. It also follows directly from the contraction 
property that fixed point iteration Xk = z{xk-i) converges to x quickly, 

Ikfc - x|| < 7 ^||xo - x||. ( 21 ) 

The weights Wij in the definition of T and S dehne a random process that generates random 
walks on G. We have a Markov chain with state space V. Starting from a vertex Qo we generate 
an infinite sequence of random vertices (Qo, Qi, ■ ■ ■) with transition probabilities 

p(Qt+i = JlQt = i) = Wij. (22) 

A natural choice for the weights is Wij = l/d(z), corresponding to moving from i to j with uniform 
probability over N(i). This choice leads to uniform random walks on G |18| . 

We consider in (M^)^ the partial order 

if <11) (pi(T) < ^{(t) Mi G E, Vr G L. (23) 

It follows trivially from the definitions of T and S that both maps preserve order in (M^)'^, 

ip < i() ^ Tp < Tif), Sp < Sij). (24) 
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We claim that for any a € 


^ Wjia,j = ^aj. (25) 

j^N{i) j^V 

This follows from re-ordering the double summation and the constraints that the weights out of 
each vertex sum to one, 

j^N(i) j^V iGN{j) j^V 

We note that the algorithms defined by T and S are related in the following sense. For a regular 
graph with degree d, if we let Wij = 1/d the maps T and S are equivalent up to rescaling if the 
costs in T and S are rescaled appropriately. 




3 Algorithm defined by T (Diffusion) 

In this section we study the fixed point algorithm dehned by T. We show that T is a contraction 
in (M^)^ and that the fixed point of T dehnes a “factored” lower bound on F. 

We start by showing that T is a contraction with respect to the norm on (M^)^ defined by 

IIV^Iloo,! = X] 

i£V 

Lemma 3.1. (Contraction) For any ip,ip G 

||(r(/})i - (rV')illoo < g ^ -V’illoo VieV, (27) 

j&N(i) 

\\{Tip) - (rV^)||oo,i < q\\^ - V'lloo,!- (28) 

Proof. Take i G V and t G L. For any x G 

{T(p)i{T) = pgiir) + V min f (r, uj) + qwjnpj{uj) 
jeNii) 

<P9i{T)+ ^hij{T,Xj) + qwjiipj{xj) 

j&N{i) 

< pgiir) + f ('T, Xj ) + qwji(i/j(xj) + \<fj[xj ) - t/j(xj )|) 

j&N(i) 

< pgiir) + ^ ^hijiT,Xj) + qwjii'4)jixj) + \\ipj-'ifjWoo) 

j&N{i) 

Since the inequality dehned by the hrst and last terms above holds for any x, it holds when x 
minimizes the last term. Therefore 

iT(p)iir) < iTiOiir) + q ^ Wji\\ipj - fjjWoo- 

j&N{i) 
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Since this inequality holds interchanging with if; we have 


\{T(p)i{T) - {T'ip)i{r)\ <q 

jeN(i) 


3 l|oo- 


Taking the r maximizing the left hand side proves (27). To prove (28), we sum the inequalities 
(27) for all i G V and use (25). □ 


The contraction property above implies the fixed point algorithm defined by T converges to 
a unique fixed point independent on initialization. It also implies the distance to the fixed point 
decreases quickly, and we can bound the distance to the fixed point using either the initial distance 
to the fixed point or the distance between consecutive iterates (a readily available measure). 

Theorem 3.2. The map T has a unique fixed point (p and for any p G (M^)^ and integer k > 0, 


- T'^tWoo,! < q \\t - T^lloo,!, 

||<^-‘^||oo.i < ^\\Tp - p\\^p. 


(29) 

(30) 


Proof. Existence and uniqueness of the fixed point, as well as the first inequality follows trivially 
from Lemma 3.1 To prove the second inequality observe that since T^p converges to p, 


|oo,l ^ 




|oo,l- 


k=0 


k=0 


Now note that since p G (0,1) and p + q = 1, 




k =0 


k=0 


P 


(31) 

(32) 

□ 


The map T and the energy function F are related as follows. 
Proposition 3.3. For any p G (M^)^ and x G 

'^{Tp)i{xi) < pF{x) + q'^pi{xi). 
iev iev 

Proof. Direct use of the definition of T yields 

y2{Tp)i{xi) = y^pgiixi) + mm ^hij{xi,Uj) + qwjipj{uj) 

iGV iGV j^N{i) 

- ^ ^hij{xi,Xj) + qwjipjixj) 

i€:V jGN{i) 


(33) 


P + 2^i3i^i^^3)] +(iYj Yj 

i^V j^N{i) J iGV j^N(i) 

■■ pF{x) + q E PjiXj), 

jev 
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where the last equality follows from the fact that hij{xi,Xj) = hji{xj,Xi) and Equation (25). □ 


Now we show the fixed point of T defines a lower bound on F in terms of a sum of local terms. 
Theorem 3.4. Let (p be the fixed point ofT and 


H{x) = y^^ipfixj). 
i£V 


Then 0 < (p and H{x) < F{x). 


Proof. The fact that H{x) < F{x) follows directly from Proposition 3.3 

To prove 0 < (^ consider the sequence (0, TO, T^O,...). The non-negativity of gi and hij implies 
0 < TO. Since T is order preserving (24) it follows by induction that T^O < T^+^0 for all A: > 0. 
Since the sequence is pointwise non-decreasing and converges to (p we have Q < (p. □ 


Theorem 3.4 allows us to compute both a lower and an upper bound on the optimal value of 
T, together with a solution where F attains the upper bound. 

Corollary 3.5. Let (p be the fixed point ofT and 


Xi = argmin (/?j(T) Mi G V, 

T 

then for any x* minimizing F, 


< F{x*) < Fix). 

i&V 

Proof. If X* is a minimizer of F, then the inequality Fix*) < Fix) holds trivially. We can use the 
definition of x to conclude that 


'^(piixi) < '^ipiix*) < Fix*), 

i£V i£V 


where the second inequality follows from Theorem 3.4 


□ 


3.1 Linear Programming Formulation 

Here we provide a LP characterizing for the fixed point of T. We note that the LP formulation 
described here is different from the standard LP relaxation for minimizing T(x) which involves the 
local polytope described in [22] . 

Consider the following LP which depends on a vector of coefficients a in 
T 

max a (p 
V 

Tiiui) < pgfiui) + ^2 + Q'^PTji^j) ^ V,\/u G L^. 
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Note that the constraints in the LP are equivalent lo (p < Tip. Next we show that this LP has a 
unique solution which equals the fixed point of T whenever every coefficient is positive, independent 
of their specific values. For example, (p is the optimal solution when a is the vector of ones. 

Theorem 3.6. If a is a non-negative vector the fixed point ofT is an optimal solution for the LP. 
If a is a positive vector the fixed point of T is the unique optimal solution for the LP. 

Proof. Let ip be the fixed point of T. First note that is a feasible solution since (p < Tip. 

Let ip £ be any feasible solution for the LP. The linear constraints are equivalent to 

p < Tp. Since T preserves order it follows by induction that T^p < T^~^^p for all A; > 0. Since the 
sequence {p,Tp,T‘^p,.. .) converges to p and it is pointwise non-decreasing we conclude p < p. 

If a is non-negative we have a'^p < fifip and therefore p must be an optimal solution for the LP. 
If a is positive and p p we have a^p < a^p. This proves the fixed point is the unique optimal 
solution for the LP. □ 

4 Algorithm defined by S (Optimal Control) 

In this section we study the algorithm defined by S. We start by showing that S corresponds to 
value iteration for an infinite horizon discounted Markov decision process (MDP) [3]. 

An infinite horizon discounted MDP is defined by a tuple {Q, A, c, t, 7 ) where Q is a set of states, 
A is a set of actions and 7 is a discount factor in M. The cost function c : Q x A —>■ M specifies a 
cost c(s, a) for taking action a on state s. The transition probabilities t: QxAxQ^R. specify 
the probability t(s, a, s') of moving to state s' if we take action a in state s. 

Let o be an infinite sequence of state and action pairs, o = ((si,ai), ( 52 , 02 ), ■ ■ ■) G {Q x A)°°. 
The (discounted) cost of o is 

00 

c(o) = ^ 7 ^c(sfe,afc). (34) 

k=0 

A policy for the MDP is defined by a map vr : Q —)• A, specifying an action to be taken at each 
state. The value of a state s under the policy tt is the expected cost of an infinite sequence of state 
and action pairs generated using vr starting at s, 

Ujr(s) = Fl[c(o)| 7 r, Si = s]. (35) 

An optimal policy vr* minimizes Vtt{s) for every starting state. Value iteration computes Vtt* as 
the fixed point of £ : —)■ M®, 

(Cv) (s) = min c(s, a) -|- 7 E t{s,a, s')v{s'). (36) 

“ s'eQ 

The map C is known to be a 7 -contraction [3] with respect to the || • ||oo norm. 
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Now we show that S is equivalent to value iteration for an MDP defined by random walks on 
G. Intuitively we have states defined by a vertex i € V and a label a € L. An action involves 
selecting a different label for each possible next vertex, and the next vertex is selected according 
to a random walk defined by the weights Wij. 


Lemma 4.1. Define an MDP (Q, A,c,t,'y) as follows. The states are pairs of vertices and labels 
Q = V X L. The actions specify a label for every possible next vertex A = LX. The discount factor 
is 'y = q. The transition probabilities and cost function are defined by 


t{{i,r),u, (j,r')) = 


Wij j £ N{i), t' = Uj 
0 otherwise 


c{{i,T),u) = pgfir) + ^ pwijhij{T,Uj) 
j&N{i) 

The map S is equivalent to value iteration for this MDP. That is, i/(/?j(r) = v{{i,T)) then 


(37) 

(38) 


{Sip)i{T) = {Cv){{i,T)). 


Proof. The result follows directly from the definition of the MDP, C and S. 


{Cv){{i,T)) 


min c((i, r), u) + 7 
ueL^ 


^ t((i,r),u, (j,r'))u(j,r') 


min pgfir) + 


pWijhij{T,Uj) +q ^ 
jeN(i) jeN{i) 


WijV{j, 


Uj) 


P9iX) + 


min phij{T, Uj) + qv{j, Uj) 

Uj GL 


= {ST)i{T) 


(39) 

(40) 

(41) 

(42) 

□ 


The relationship to value iteration shows S' is a contraction and we have the following results 
regarding fixed point iterations with S. 

Theorem 4.2. The map S has a unique fixed point f> and for any ip £ (M^)'^ and integer k>t), 


\ip-S'^ 

\\t- 


|oo ^ q 


< -IlSy?- 
P 


(43) 

(44) 


Proof. The first inequality follows directly from Lemma 4.1 and the fact that £ is a 7 -contraction 
with 'y = q. The proof of the second inequality is similar to the proof of the analogous result for 
the map T in Theorem 3.2 □ 
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4.1 Random Walks 


The formalism of MDPs is quite general, and encompasses the fixed point algorithm defined by S. 
In this section we further analyze this fixed point algorithm and provide an interpretation using 
one-dimensional problems defined by random walks on G. 

The weights Wij define a random process that generates infinite walks on G. Starting from some 
vertex in V we repeatedly move to a neighboring vertex, and the probability of moving from i € V 
to j G -/V(f) in one step is given by Wij. 

An infinite walk oj = (a;i,a;2,. ..) G V°° can be used to define an energy on an infinite sequence 
of labels 2; = (zi, Z 2 , ...) G L°°, 

00 

Fuj{z) = (45) 

t=0 

The energy Fi^{z) can be seen as the energy of a pairwise classification problem on a graph G' = 
(y', E') that is an infinite path, 

y' = {l,2,...}, (46) 

i?' = {{l,2},{2,3},...}. (47) 

The graph G' can be interpreted as a one-dimensional “unwrapping” of G along the walk w. This 
unwrapping defines a map from vertices in the path G' to vertices in G. 

Consider a policy -k :V x L x V ^ L that specifies Zk+i in terms of LOk, Zk and cok+i, 

Zk+i = 7i:{uik,Zk,uJk+i)- (48) 

Now consider the expected value of F^z) when cj is a random walk starting at i £ V and z is a 
sequence of labels defined by the policy vr starting with zi = r, 

v^(i,T) = E[Fyz)\uJi = i,zi= T,Zk+i = Tr{uJk,Zk,<^k+i)]- (49) 

There is an optimal policy vr* that minimizes v-n{i, r) for every i £ V and t £ L. Let ip be the fixed 
point of S. Then (pi(r) = u^*(f,r). This follows directly from the connection between S and the 
MDP described in the last section. 

4.2 Bounding the Value Functions of F 

Now we show that p defines lower bounds on the value functions (max-marginals) fi defined in Q. 
We start by showing that fi can be lower bounded by fj for j £ N{i). 

Proposition 4.3. Let i £ V and j £ N{i). 

fi{ui) > pgi{ui) + mm.phij{ui,Uj) + qfj{uj), 

Uj 

fi{ui) > pgi{ui) -F Wij minphij{ui, uj) + qfj{uj). 

* ^ Uj 

j£N{i) 
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(50) 

(51) 


Proof. The second inequality follows from the first one by taking a convex combination over j G 
N{i). To prove the first inequality note that, 


II 

Ti 

(52) 

= min min Fix) 

Uj^L 

Xi=Ui^Xj=Uj 

(53) 

= min min pF{x) + qF{x) 

Xi=Ui,Xj=Uj 

(54) 

> pgfiui) + min phij{ui,Uj) + min qF{x) 

Uj&L xeL^ 

OC'i —j — 

(55) 

> pgfixi) + min phij{ui,Uj) + min qF{x) 

Uj£L x&L'^ 

Xj =Uj 

(56) 

= pgfixi) + min phij{ui,Uj) + qfj{uj). 

Uj £L 

(57) 


The first inequality above follows from F{x) > gi{xi) + hij{xi,Xj) since all the terms in F{x) are 
non-negative. The second inequality follows from the fact that we are minimizing F(x) over x with 
fewer restrictions. □ 


The map S and the value functions are related as follows. 

Proposition 4.4. Let f = (/i,..., /jv) G (M^)^ be a field of beliefs defined by the value functions. 

Sf < f. (58) 

□ 


Proof. The result follows directly from Proposition 4.3 


Now we show that the fixed point of S defines lower bounds on the value functions. 
Theorem 4.5. Let (p be the fixed point of S. Then 


0 < < ffir). 

Proof. Since the costs and hij are non-negative we have 0 < 50. Using the fact that S preserves 
order we can conclude < (p. 

Since Sf < f and S preserves order, f < f for all k. To end the proof, take the limit fc —)> oo 
at the left hand-side of this inequality. □ 


5 Numerical Experiments 

In this section we illustrate the practical feasibility of the proposed algorithms with preliminary 
experiments in computer vision problems. We also evaluate the fixed point algorithms defined by 
S and T and other methods on random binary classification problems on a grid. 
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5.1 Image Restoration 


The goal of image restoration is to estimate a clean image z from a noisy, or corrupted, version 
y. A classical approach to solve this problem involves looking for a piecewise smooth image x that 
is similar to y |12t [6]. In the weak membrane model the local costs gi{a) penalize differences 
between x and y while the pairwise costs hij{a,b) penalize differences between neighboring pixels 
in X. In this setting, the graph G = {V, E) is a grid in which the vertices V correspond to pixels 
and the edges E connect neighboring pixels. The labels L are possible pixel values and a labeling 
X defines an image. For our experiments we use L = {0,..., 255} corresponding to the possible 
values in an 8-bit image. 

To restore y we define the energy F{x) using 

gi{xi) = {yi-Xif] (59) 

hij{xi,Xj) = Amin((xi — Xj)^,r). (60) 

The local cost gi{xi) encourages Xi to be similar to y*. The pairwise costs depend on two parameters 
A, r G M. The cost hij{xi,Xj) encourages Xi to be similar to Xj but also allows for large differences 
since the cost is bounded by r. The value of A controls the relative weight of the local and pairwise 
costs. Small values of A lead to images x that are very similar to the noisy image y, while large 
values of A lead to images x that are smoother. 

Figure shows an example result of image restoration using the algorithm defined by T. The 
example illustrates the algorithm is able to recover a clean image that is smooth almost everywhere 
while at the same time preserving sharp discontinuities at the boundaries of objects. For comparison 
we also show the results of belief propagation. In this example the noisy image y was obtained 
from a clean image z by adding independent noise to each pixel using a Gaussian distribution 
with standard deviation a = 20. The input image has 122 by 179 pixels. We used A = 0.05 and 
r = 100 to define the pairwise costs. For the algorithm defined by T we used uniform weights, 
Wij = l/d(i) and p = 0.001. Both the algorithms defined by T and belief propagation were run 
for 100 iterations. We based our implementations on the belief propagation code from [9], which 
provides efficient methods for handling truncated quadratic discontinuity costs. The algorithm 
defined by T took 16 seconds on a 1.6Ghz Intel Core i5 laptop computer while belief propagation 
took 18 seconds. 

The goal of restoration is to recover a clean image z. We evaluate the restored image x by 
computing the root mean squared error (RMSE) between x and z. We see in Figure]^ that when 
A = 0.05 and r = 100 the result of T has lower RMSE value compared to the result of BP, even 
though the result of T has significantly higher energy. We also evaluate the results of T, S and 
BP using different values of A in Table For all of these experiments we used r = 100 and ran 
each algorithm for 100 iterations. The minimum RMSE obtained by T and S is lower than the 
minimum RMSE obtained by BP considering different values for A, even though T and S alwayd 
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Original Image 


Noisy Image 


Output of T 
RMS error = 8.9 
Energy = 1519837 


Output of BP 
RMS error = 10.7 
Energy = 650296 


Eigure 2; Image restoration using the fixed point algorithm defined by T and BP. The algorithms 
were run for 100 iterations. 


find solutions that have higher energy compared to BP. This suggests the algorithms we propose do 
a good job aggregating local information using pairwise constraints, but the energy minimization 
problem defined by F(x) may not be the ideal formulation of the restoration problem. 


5.2 Stereo Depth Estimation 

In stereo matching we have two images Ii and Ir taken at the same time from different viewpoints. 
Most pixels in one image have a corresponding pixel in the other image, being the projection of the 
same three-dimensional point. The difference in the coordinates of corresponding pixels is called 
the disparity. We assume the images are rectified such that a pixel (x, y) in Ii matches a pixel 
{x — d, y) in Ir with d > 0. Eor rectified images the distance of a three-dimensional point to the 
image plane is inversely proportional to the disparity. 

In practice we consider the problem of labeling every pixel in Ii with an integer disparity in 
L = {0,..., D}. In this case a labeling x is a disparity map for Ii. The local costs gi{a) encourage 
pixels in to be matched to pixels of similar color in Ir- The pairwise costs hij{a,b) encourage 
piecewise smooth disparity maps. 

The model we used in our stereo experiment is defined by. 


gi{a) 


hij{a,b) 


min(7, ||/z(z) - - (a,0 ))||i); 

/ 

0 a = b, 

< a |a — 6| = 1, 

/? |a — 6| > 1. 


(61) 

(62) 
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T 

S 

BP 

A 

Energy 

RMSE 

Energy 

RMSE 

Energy 

RMSE 

0.01 

659210 

10.1 

842459 

9.1 

211646 

15.8 

0.02 

943508 

9.0 

1220572 

8.7 

337785 

13.4 

0.05 

1519837 

8.9 

1873560 

10.9 

650296 

10.7 

0.10 

2089415 

11.0 

2506230 

14.8 

1080976 

10.1 

0.20 

2700193 

14.8 

2942392 

17.7 

1730132 

12.9 


Table 1: Results of restoration using T, S and belief propagation (BP). The goal of restoration is 
to recover the original image z. We show the energy of the restored image x and the root mean 
squared error (RMSE) between x and z. We show the results of the different algorithms for different 
values of the parameter A. Both T and S obtain lower RMSE compared to BP even though BP 
generally obtains results with significantly lower energy. 

Here Ii{i) is the value of pixel i in while R(* ~ («) 0)) is the value of the corresponding pixel in 
Ir assuming a disparity a for i. The norm — Ir{i — (a,0))||i defines a distance between 

RGB values (matching pixels should have similar color). The color distance is truncated by 7 to 
allow for some large color differences which occur due to specular reflections and occlusions. The 
pairwise costs depend on two parameter a, /3 G M with a < (5. The pairwise costs encourage the 
disparity neighboring pixels to be similar or differ by 1 (to allow for slanted surfaces), but also 
allows for large discontinuities which occur at object boundaries. 

Eigurej^ shows an example result of disparity estimation using the fixed point algorithm defined 
by S'. In this example we used non-uniform weights Wij to emphasize the relationships between 
neighboring pixels of similar color, since those pixels are most likely to belong to the same ob¬ 
ject/surface. The parameters we used for the results in Figure]^ were defined by, 

Wij oc 0.01 (63) 

p = 0.0001, a = 500, = 1000 and 7 = 20. The input image has 384 by 288 pixels and the 

maximum disparity is D = 15. The fixed point algorithm was run for 1,000 iterations which took 
13 seconds on a laptop computer. 

We note that the results in Figure are similar to results obtained min-sum belief propagation 
shown in [9j. 

5.3 Binary Classification on a Grid 

Let G = (R, E) he a K hy K grid and L = {—1, +!}• We can define a classification problem on G 
using energy functions of the form, 

F{x) = '^aiXi+ ^ f3ijXiXj. (64) 

iev 
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Ground truth 


Result of S 


Figure 3: Stereo disparity estimation using the fixed point algorithm defined by S on the Tsukuba 
image pair. The algorithm was run for 1,000 iterations. 
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To evaluate different algorithms we generated random classification problems by independently 
selecting each at from a uniform distribution over [—1, +1] and each from a uniform distribution 
over [—A, +A]. The parameter A controls the relative strength of the pairwise relations and the local 
information associated with each vertex. 

Let X* be the global minimum of F{x). When iL = 10 we can compute x* using dynamic 
programming over the columns of G. In our experiments we quantify the quality of a potential 
solution X using two different measures: (1) the value of F{x), and (2) the Hamming distance 
between x and x*. Table shows the results of various algorithms we evaluated on 20 random 
problem instances. 

We compare the results of the following algorithms: 

DP Dynamic programming over the columns of G. Dynamic programming finds the global mini¬ 
mum of F{x) but has a runtime that is exponential in K. 

ICM Iterative conditional modes [5] . This is a simple local search technique that considers changing 
the label of a single vertex at a time. 

S The fixed point algorithm defined by S with p = O.OI and Wij = I/d(i). 

T The fixed point algorithm defined by T with p = 0.01 and Wij = I/d(i). 

BP Loopy belief propagation with a damping factor of 0.5. 

TRWS Sequential tree-reweighted message passing m- 

ADSAL Adaptive Diminishing Smoothing [2T]. This algorithm is based on the local polytope LP 
relaxation of optimization problem defined by F. 

All of the algorithms were implemented in C-|—1-. For BP, TRWS and ADSAL we used the imple¬ 
mentation available in OpenGM2 [T]. We also consider the result of applying local-search (ICM) 
to the output of each algorithm. All iterative algorithms were run for 1000 iterations. 

We found that post-processing solutions of different algorithms with local-search (ICM) often 
gives a substantial improvement. In particular the solutions found by the algorithms defined by S 
and T are not very good in terms of their energy value when compared to the alternatives. However, 
the solutions found by S and T improve in energy substantially after post-processing with ICM. We 
also found that lower energy solutions are sometimes further from the global minimum, in terms 
of Hamming distance, when compared to higher energy solutions. For example, on average, the 
energy of the solutions found by BP-I-ICM is lower the energy of the solutions found by T, but 
the solutions found by T are closer to the global minimum. After post-processing with ICM the 
solutions found by S and T and are often closer to the global optimum when compared to the 
alternatives. 
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A = 10 

Algorithm 

Energy 

Distance to x* 


Mean 

a 

Mean 

a 

DP 

-727.133222 

34.035102 

0.000000 

0.000000 

ICM 

-621.519900 

43.942530 

47.750000 

15.112495 

S 

-389.791776 

81.379435 

37.750000 

6.744442 

S + ICM 

-650.013870 

36.927557 

35.600000 

11.275637 

T 

-401.615127 

68.749218 

37.950000 

6.924413 

T -b ICM 

-650.204712 

40.796138 

36.050000 

10.883359 

BP 

-595.403707 

79.575241 

42.350000 

18.224366 

BP -b ICM 

-672.816406 

46.895083 

42.150000 

18.850133 

TRWS 

-660.259499 

37.781695 

42.650000 

20.857313 

TRWS + ICM 

-689.407548 

39.445723 

41.100000 

22.666936 

ADSAL 

-667.080187 

36.400576 

35.700000 

19.657314 

ADSAL -b ICM 

-693.317195 

36.631941 

34.650000 

21.036338 


Table 2: Comparison of different inference algorithms on random 10x10 binary grid labeling prob¬ 
lems when A = 10. In practice we see that some algorithms like BP yield lower energy solutions 
compared to S' or T but those solutions are often further from the global minimum. 

In terms of running time the algorithms defined by S and T are more efficient than the alterna¬ 
tives. Each iteration using S or T requires updating a belief for each vertex of G. In contrast, each 
BP iteration requires updating two messages for each edge of G. Other message passing methods 
like TRWS are similar to BP. Both S and T require less memory than these other algorithms 
because they work with beliefs associated with vertices instead of messages associated with edges. 
Moreover, we find that the fixed point algorithms defined by S and T converge faster than the 
alternatives. Table compares the results of different algorithms when using a limited number of 
iterations. The ADSAL algorithm solves a sequence of problems using TRWS as a subroutine. In 
this case the number of iterations refers to the iterations of the TRWS subroutine. 

6 Conclusion and Future Work 

The experimental results in the last section illustrate the practical feasibility of the algorithms 
under study. Our theoretical results prove these algorithms are guaranteed to converge to unique 
fixed points on graphs with arbitrary topology and with arbitrary pairwise relationships. This 
includes the case of repulsive interactions which often leads to convergence problems for message 
passing methods. 

Our results can be extended to other contraction maps similar to T and S and alternative 
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Algorithm 

100 Iterations 

1000 Iterations 

S + ICM 

35.6 

35.6 

T -b ICM 

36.0 

36.0 

BP -b ICM 

45.9 

42.1 

TRWS + ICM 

41.6 

41.1 

ADSAL -b ICM 

37.3 

34.6 


Table 3: Comparison of different algorithms when using a limited number of iterations, in terms of 
the mean Hamming distance between the solution found and the global optimum x*. 


methods for computing the fixed points of these maps. Some specific directions for future work are 
as follows. 

1. Asynchronous updates. It is possible to define algorithms that update the beliefs of a single 
vertex at a time in any order. As long as all vertices are updated infinitely many times, the 
resulting algorithms converge to the same fixed point as the parallel update methods examined 
in this work. We conjecture that in a sequential computation, the sequential update of vertices 
in a “sweep” would converge faster than a “parallel” update. Moreover, after a sequential 
update of all vertices, the neighbors of those vertices with greater change should be the first 
ones to be updated in the next “sweep”. 

2. Non-backtracking random walks. The algorithms defined by S and T can be understood in 
terms of random walks on G. It is possible to define alternative algorithms based on non¬ 
backtracking random walks. In particular, starting with the MDP in Sectionj^we can increase 
the state-space Q to keep track of the last vertex visited in the walk and define transition 
probabilities that avoid the previous vertex when selecting the next one. The resulting value 
iteration algorithm becomes very similar to belief propagation and other message passing 
methods that involve messages defined on the edges of G. 
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